Expanding Training Sets with Unlabeled Samples by Learned Attributes
Authors
Abstract
Designing generalizable classifiers for visual categories is an active research area and has led to the development of many sophisticated classifiers in vision and machine learning [9]. Building a good training set with minimal supervision is a core problem in training visual category recognition algorithms [1]. A good training set should span the appearance variability of its category. While the internet provides a nearly boundless set of potentially useful images for training many categories, the challenge is to select the relevant ones: those that move the decision boundary of a classifier closer to the best achievable. Given a relatively small initial set of labeled samples from a category, we therefore want to mine a large pool of unlabeled samples to identify visually different examples without human intervention.
To expand the boundary of a category into an unseen region, we propose a method that selects unlabeled samples based on their attributes. The selected unlabeled samples are not always instances of the same category, but they can still improve category recognition accuracy, similar to [4, 5]. We use two types of attributes: category-wide attributes and example-specific attributes. The category-wide attributes find samples that share a large number of discriminative attributes with the preponderance of the training data. The example-specific attributes find samples that are highly predictive of the hard examples of a category, i.e., the ones poorly predicted under a leave-one-out protocol. We demonstrate that our augmented training set can significantly improve recognition accuracy over a very small initial labeled training set, where the unlabeled samples are selected from a very large unlabeled image pool, e.g., ImageNet.
Our contributions are summarized as follows:
1. We show the effectiveness of using attributes learned with auxiliary data to label unlabeled images that have no annotated attributes.
2. We propose a framework that jointly identifies the unlabeled images and the category-wide attributes through an optimization that seeks high classification accuracy in both the original feature space and the attribute space.
3. We propose a method to learn example-specific attributes from a small training set, used with the proposed framework. We then combine the category-wide and the example-specific attributes to further improve the quality of image selection by diversifying the variations of the selected images.
Without modeling the sample distribution and without human involvement in the loop, we find and add samples to categories by their attributes, and these samples are helpful for recognition. For a more detailed description of the approach and more results, please refer to our main conference version, titled “Adding Unlabeled Samples to Categories by Learned Attributes.”
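The two selection ideas in the abstract can be illustrated with a small sketch. This is not the authors' method: the abstract describes a joint optimization over images and attributes, whereas the code below replaces it with two simple stand-ins. It assumes binary attribute vectors are already available for every image, treats an attribute as category-wide when it appears in at least a fraction `min_fraction` of the labeled samples, and flags hard examples with a leave-one-out nearest-class-mean classifier. All function names, parameters, and thresholds are hypothetical.

```python
import numpy as np


def category_wide_attributes(labeled_attrs, min_fraction=0.8):
    """Return indices of attributes present in at least `min_fraction`
    of the labeled samples (a stand-in for 'category-wide' attributes)."""
    frequency = np.asarray(labeled_attrs, dtype=float).mean(axis=0)
    return np.flatnonzero(frequency >= min_fraction)


def select_by_shared_attributes(labeled_attrs, pool_attrs, k, min_fraction=0.8):
    """Return indices of the k pool samples that share the most
    category-wide attributes with the labeled set."""
    wide = category_wide_attributes(labeled_attrs, min_fraction)
    shared = np.asarray(pool_attrs)[:, wide].sum(axis=1)
    # Stable sort on the negated counts: ties keep their pool order.
    order = np.argsort(-shared, kind="stable")
    return order[:k]


def hard_examples_leave_one_out(features, labels):
    """Return indices of samples misclassified by a nearest-class-mean
    classifier when that sample is held out (assumed hardness criterion).
    Every class needs at least two samples."""
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    hard = []
    for i in range(len(labels)):
        keep = np.ones(len(labels), dtype=bool)
        keep[i] = False  # hold out sample i
        means = np.stack(
            [features[keep & (labels == c)].mean(axis=0) for c in classes]
        )
        distances = np.linalg.norm(means - features[i], axis=1)
        if classes[np.argmin(distances)] != labels[i]:
            hard.append(i)
    return hard
```

In this simplified picture, `select_by_shared_attributes` plays the role of the category-wide criterion, and the indices returned by `hard_examples_leave_one_out` mark the labeled samples for which example-specific attributes would be learned.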
Similar resources
Multi-class Co-training Learning for Object and Scene Recognition
It is often tedious and expensive to label large training data sets for learning-based object and scene recognition systems. This problem could be alleviated by semi-supervised learning techniques, which can automatically select more training samples from unlabeled data to reduce the cost of labeling. In this paper, we propose a multi-class co-training learning method of two different views f...
Discovery of Informative Unlabeled Data for Improved Learning
In computer vision, the acquisition of sufficient labeled data for training is often time-consuming. However, unlabeled data are conveniently available. The key problem is to discover and incorporate those informative and confidently predicted unlabeled data into the training set for improved learning. In this paper, we discover such unlabeled data by exploiting the locality property of the dat...
Exploring Early Classification Strategies of Streaming Data with Delayed Attributes
In contrast to traditional machine learning algorithms, where all data are available in batch mode, the new paradigm of streaming data poses additional difficulties, since data samples arrive in a sequence and many hard decisions have to be made on-line. The problem addressed here consists of classifying streaming data which not only are unlabeled, but also have a number l of attributes arrivin...
CBC: Clustering Based Text Classification Requiring Minimal Labeled Data
Semi-supervised learning methods construct classifiers using both labeled and unlabeled training data samples. While unlabeled data samples can help to improve the accuracy of trained models to a certain extent, existing methods still face difficulties when the labeled data are insufficient and biased against the underlying data distribution. In this paper, we present a clustering based classificati...
GraphConnect: A Regularization Framework for Neural Networks
Deep neural networks have proved very successful in domains where large training sets are available, but when the number of training samples is small, their performance suffers from overfitting. Prior methods of reducing overfitting such as weight decay, Dropout and DropConnect are data-independent. This paper proposes a new method, GraphConnect, that is data-dependent, and is motivated by the ...